Towards an Open Source Toolkit for Building Record Linkage Workflows

Authors

  • Marco Fortini
  • Monica Scannapieco
  • Laura Tosco
  • Tiziana Tuoto
Abstract

Record linkage has been a subject of research for several decades, and a huge number of record linkage solutions have been proposed, based on probabilistic and empirical paradigms. However, record linkage is a complex process for which a single technique is often not enough; it can be seen as composed of distinct phases, each requiring a specific technique and depending on the given application and data requirements. Because of this complexity and application dependency, in this paper we propose a toolkit for record linkage, called RELAIS. The toolkit is based on the idea of choosing the most appropriate technique for each phase and of combining these techniques in a dynamically built record linkage workflow. A real case study validates the RELAIS idea and provides a methodological pattern for driving the design of a record linkage workflow on the basis of the requirements of a real application.
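
The idea of picking one technique per phase and composing them into a workflow can be illustrated with a minimal Python sketch. This is not the RELAIS API: the three phases used here (search-space reduction, comparison, decision), the helper names `block_on`, `exact_agreement`, `threshold`, and `Workflow`, and the sample records are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Pair = Tuple[Record, Record]


def block_on(field: str) -> Callable[[List[Record], List[Record]], List[Pair]]:
    """Hypothetical reduction phase: pair only records that share `field`."""
    def phase(a: List[Record], b: List[Record]) -> List[Pair]:
        index: Dict[str, List[Record]] = {}
        for rec in b:
            index.setdefault(rec[field], []).append(rec)
        return [(ra, rb) for ra in a for rb in index.get(ra[field], [])]
    return phase


def exact_agreement(fields: List[str]) -> Callable[[Pair], float]:
    """Hypothetical comparison phase: fraction of fields that agree exactly."""
    def phase(pair: Pair) -> float:
        ra, rb = pair
        return sum(ra[f] == rb[f] for f in fields) / len(fields)
    return phase


def threshold(t: float) -> Callable[[float], bool]:
    """Hypothetical decision phase: declare a link when the score reaches `t`."""
    return lambda score: score >= t


@dataclass
class Workflow:
    """A workflow is one chosen technique per phase, applied in sequence."""
    reduce_space: Callable[[List[Record], List[Record]], List[Pair]]
    compare: Callable[[Pair], float]
    decide: Callable[[float], bool]

    def run(self, a: List[Record], b: List[Record]) -> List[Pair]:
        candidates = self.reduce_space(a, b)
        return [p for p in candidates if self.decide(self.compare(p))]


# Made-up sample data, for illustration only.
a = [
    {"id": "a1", "surname": "rossi", "name": "maria", "zip": "00184"},
    {"id": "a2", "surname": "bianchi", "name": "luca", "zip": "20121"},
]
b = [
    {"id": "b1", "surname": "rossi", "name": "maria", "zip": "00184"},
    {"id": "b2", "surname": "bianchi", "name": "lucia", "zip": "20121"},
    {"id": "b3", "surname": "verdi", "name": "anna", "zip": "00184"},
]

strict = Workflow(block_on("zip"), exact_agreement(["surname", "name"]), threshold(1.0))
links = [(ra["id"], rb["id"]) for ra, rb in strict.run(a, b)]

# Swapping only the decision technique yields a different workflow.
loose = Workflow(strict.reduce_space, strict.compare, threshold(0.5))
loose_links = [(ra["id"], rb["id"]) for ra, rb in loose.run(a, b)]
```

Because each phase is just a callable, a different technique (for example, a probabilistic decision rule in place of the fixed threshold) could be substituted without touching the other phases, which is the kind of composition the abstract describes.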

Similar resources

An Open Source Toolkit for Quantitative Historical Linguistics

Given the increasing interest and development of computational and quantitative methods in historical linguistics, it is important that scholars have a basis for documenting, testing, evaluating, and sharing complex workflows. We present a novel open-source toolkit for quantitative tasks in historical linguistics that offers these features. This toolkit also serves as an interface between exist...

A Comparison of String Distance Metrics for Name-Matching Tasks

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid s...

Open source tools and toolkits for bioinformatics: significance, and where are we?

This review summarizes important work in open-source bioinformatics software that has occurred over the past couple of years. The survey is intended to illustrate how programs and toolkits whose source code has been developed or released under an Open Source license have changed informatics-heavy areas of life science research. Rather than creating a comprehensive list of all tools developed ov...

Performing radiation therapy research using the open-source SlicerRT toolkit

Radiation therapy (RT) is a common treatment option for a wide variety of cancer types. Despite significant improvements in this technique over the past years, software tools for research in RT are limited to either expensive, closed, proprietary applications or heterogeneous sets of open-source software packages with limited scope, reliability, and user support. Our SlicerRT toolkit aspires to...

The GSI Plug-In for gSOAP: Building Cross-Grid Interoperable Secure Grid Services

Increasingly, grid computing is becoming the paradigm of choice for building large-scale complex scientific applications. These applications are characterized as being computationally and/or data intensive, requiring computational power and storage resources well beyond the capability of a single computer. Grid environments provide distributed, geographically spread computing and storage resour...

Publication date: 2006